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Abstract. Quasi-Monte Carlo (QMC) rules 1/N J 2 n=o 
can be used to approximate integrals of the form ^ f(yA) d y, 

where A is a matrix and y is row vector. This type of integral 
arises for example from the simulation of a normal distribution 
with a general covariance matrix, from the approximation of the 
expectation value of solutions of PDEs with random coefficients, 
or from applications from statistics. In this paper we design QMC 
quadrature points y Q , ..., y^-i € [0, l] s such that for the matrix 
Y = ( y q", ... ,yJ f _ 1 ) T whose rows are the quadrature points, one 
can use the fast Fourier transform to compute the matrix-vector 
product Ya r , a £ K s , in 0(N log N) operations and at most s — 1 
extra additions. The proposed method can be applied to lattice 
rules, polynomial lattice rules and a certain type of Korobov p-set. 

The approach is illustrated computationally by three numerical 
experiments. The first test considers the generation of points with 
normal distribution and general covariance matrix, the second test 
applies QMC to high-dimensional, affine-parametric, elliptic par¬ 
tial differential equations with uniformly distributed random coef¬ 
ficients, and the third test addresses Finite-Element discretizations 
of elliptic partial differential equations with high-dimensional, log¬ 
normal random input data. All numerical tests show a significant 
speed-up of the computation times of the fast QMC matrix method 
compared to a conventional implementation as the dimension be¬ 
comes large. 

Key words: Quasi-Monte Carlo, fast Fourier transform, lattice rule, 
polynomial lattice rule, Korobov p-set, high-dimensional integration, 


partial differential equations with random input. 
MSC Class: 65C05 


1. Introduction 


We are interested in numerical approximations of integrals of the 


form 


( 1 ) 



l 


2 JOSEF DICK, FRANCES Y. KUO, QUOC T. LE GIA, CHRISTOPH SCHWAB 

where the parameter domain U is a subset of /i is a probability 
measure on U, y is a 1 x s row vector, and A is an s x t real matrix. 
Often we have t = s, but there are also instances where t is much 
larger than s, see Section [3] below. We approximate these integrals by 
equal-weight quadrature rules 

JV-l 

(2) j; £ f(y«A), 

71=0 

where y Q ,..., y N _ 1 £ U are quadrature points which are expressed 
again as row vectors (using row vectors merely simplifies our notation 
later on, it is not a necessity). We are interested in cases where the 
computation of y n A for n = 0,..., N — 1 is a significant factor in the 
computation of (J2J) and where N is significantly smaller than 2 s (say 
N ~ s K for some k > 0). The condition IV <C 2 s is often naturally 
satisfied: for instance if the dimension s is very large, say s > 100, 
then the number of points N which can be used on current computers 
is much smaller than 2 100 ss 10 30 ; another instance arises for example if 
the dimension is derived from a discretization or approximation scheme 
where one needs to increase the dimension s together with N in order 
to reduce the discretization or approximation error in a way such that 
JV < 2 s . Examples where such situations arise naturally are given in 
Section [3j 

Returning to the approximation of (ITT) bv 121) . one concrete example 
of our setting arises from taking the expectation of some quantity of 
interest with respect to the multivariate normal density with a general 
covariance matrix £ £ M sxs , 

f 

J(2tt) s det(£) 

Using a factorization £ = A T A together with the substitution z = yA, 
we arrive at the integral (IT]) , with U = M s , s = t, and with /j being the 
standard product Gaussian measure. (If the mean for the multivariate 
normal density is nonzero then a translation should be included in the 
substitution, but the general principle remains the same.) The method 
(PD can be interpreted as the simple Monte Carlo approximation with 
y 0 ,, y N _i £ being i.i.d. standard Gaussian random vectors, and 
the computation of y n A for n = 0,..., IV — 1 can be interpreted as 
generating normally distributed points in with the given covariance 
matrix £. The method ([2]) can also be interpreted as a quasi-Monte 
Carlo (QMC) approximation with y n = <h _1 (a; n ) for n = 0,..., IV — 1, 
where x 0 , ■ ■ ■ , x N _ 1 £ [0, l] s are deterministic QMC sample points, 
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and where : [0,1] —>■ M denotes the inverse of the standard normal 
distribution function and is applied component-wise to a vector. We 
will consider this example in Subsection 13.11 

Several papers studied how one can obtain matrices A in special 
circumstances which allow a fast matrix-vector multiplication. In the 
context of generating Brownian paths in mathematical finance, it is 
well known that the “standard construction” (corresponding to the 
Cholesky factorization of £) and the “Brownian Bridge construction” 
can be done in 0(s ) operations without explicitly carrying out the 
matrix-vector multiplication (see e.g., Em), while the “principal com¬ 
ponents construction” (corresponding to the eigenvalue decomposition 
of £) can sometimes be carried out using Discrete Sine Transform in 
0(s logs) operations (see e.g., [HUES]); other fast orthogonal trans¬ 
form strategies can also be used (see e.g., [18]). In the context of 
PDEs with random coefficients, it is known that circulant embedding 
techniques can be applied in the generation of stationary Gaussian ran¬ 
dom fields so that Fast Fourier Transform (FFT) can be used (see e.g., 

mm)- 

The approach in this paper differs from all of the above in that we do 
not try to modify the matrix A, but rather modify the quadrature points 
y 0 ,..., y N _i to reduce the cost of computing the N matrix-vector prod¬ 
ucts y n A for n = 0,..., N — 1. This implies that we do not require any 
structure in the matrix A, and so our approach is applicable in general 
circumstances. For example, in some finance problems the payoff de¬ 
pends not only on the Brownian paths but also on a basket of assets; 
our approach can be used to speed up the matrix-vector multiplications 
with a factorization of the covariance matrix among the assets. Another 
example arises from the maximum likelihood estimation of generalised 
response models in statistics: the change of variables strategy proposed 
in [16] requires one to numerically compute the stationary point of 
the exponent in the likelihood integrand function and the correspond¬ 
ing quadratic term in the multivariate Taylor expansion; our approach 
can be used to speed up the matrix-vector multiplications with a fac¬ 
torization of this numerically computed Hessian matrix. Yet another 
important example arises from parametric PDEs on high-dimensional 
parameter spaces, which appear in computational uncertainty quan¬ 
tification; the presently proposed approach can be used for both the 
so-called “uniform” and “log-normal” inputs (see e.g., pananz]). we 
will consider these PDE applications in Subsections 13.21 and 13.31 
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To explain the idea behind our approach, we introduce the matrix 



and we want to have a fast method to compute 



To compute (J2]), we propose to first compute the product B = YA, 
store the matrix 5, and then evaluate 


1 N - 1 

ivE/(»»)• 

n=0 


Thus this method requires 0(Nt ) storage. In general, the computation 
of YA requires O(Nst) operations, and the quadrature sum requires 
0(N) operations. In the following we construct quadrature points 
y 0 ,..., y N _i for which the matrix Y permits a matrix-vector multipli¬ 
cation Ya in 0(N log N) operations, where a can be any column of 
the matrix A. The computation of YA then reduces to 0(t N log N) 
operations, instead of O(Nst) operations for the straight forward im¬ 
plementation. This leads to significant speedup provided that N is 
much smaller than 2 s . 

The basic idea of the proposed approach is to find quadrature point 
sets {y 0 ,..., y N _ i} G R s with a specific ordering such that the matrix 



has a factorization of the form 


where Z G Rfw-ilxpv-i) is a circulant matrix and P G {0,1}( 7V_1 ) XS is a 
matrix in which each column has at most one value which is 1 and with 
the remaining entries being 0. The special structure means that, for 
a given column vector a, the column vector a' = Pa can be obtained 
in at most 0(N) operations, and the matrix-vector multiplication Za! 
can be computed in 0(N log N) operations using FFT. On the other 
hand, the computation of y 0 a requires at most s — 1 additions and 1 
multiplication. The vector y 0 is separated out because typical QMC 


FAST QMC MATRIX-VECTOR MULTIPLICATION 


5 


methods would lead to y 0 being a constant vector. If y 0 = (0,..., 0), 
then no extra computation is necessary. 

In Section [2] we consider two important classes of QMC point sets 
whose structure facilitates the use of the presently proposed accelera¬ 
tion: 

• point sets derived from lattice rules, 

• the union of all Korobov lattice point sets (which is one class 
of “Korobov p-sets”). 

The same strategy can be applied also to polynomial lattice rules 
where the modulus is a primitive polynomial over a finite field F& of 
order b, and to the union of all Korobov polynomial lattice rules. 

Note that lattice rules and polynomial lattice rules can yield a con¬ 
vergence rate close to 0{N ~ l ) for sufficiently smooth integrands, with 
the implied constant independent of the integration-dimension s un¬ 
der appropriate conditions on the integrand function and the underly¬ 
ing function space setting, see e.g., [3j. The union of Korobov lattice 
point sets on the other hand achieves a convergence rate of 0(N~ 1 ^ 2 ) 
for a much larger class of functions, and dimension independent error 
bounds can be obtained with significantly weaker assumptions EW 
Thus, when the integrand is not smooth enough for lattice rules and 
polynomial lattice rules, the union of Korobov lattice point sets can be 
a good substitute for the simple Monte Carlo method so that the fast 
computation approach of this paper can be exploited. 

To illustrate our method and to investigate numerically for which 
parameter ranges the improvements in the computational cost are vis¬ 
ible, we consider three applications in Section [3j In Subsection 13.11 we 
generate normally distributed points with a general covariance matrix. 
In Subsections 13.21 and 13.31 we consider PDEs with random coefficients 
in the uniform case and log-normal case, respectively. The numeri¬ 
cal results in Section [4] show that our method is significantly faster 
whenever the dimension becomes large. 

2. Fast QMC matrix-vector multiplication 

We explain the fast method for lattice point sets and the union of all 
Korobov lattice point sets. The basic idea also applies to polynomial 
lattice point sets and the union of all Korobov polynomial lattice point 
sets. 

2.1. Fast matrix-vector multiplication for lattice point sets. 

Our approach is very similar to the method used in |I9] for the fast 
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component-by-component construction of the generating vector for lat¬ 
tice rules, however, we apply it now to the matrix vector multiplication 
Ya rather then the component-by-component construction. For sim¬ 
plicity, we confine the exposition to cases where the number of points 
is a prime. Based on the presently developed ideas, the general case 
can be handled analogously with the method from [ 20 ] . 

Let N be a prime number, let Zjy = {0,1,..., N — 1}, and let Z* N = 
{1,2,...,TV-1}. 

A lattice point set with generator (gi, g 2 , ... ,g s ) G (%* N ) S is of the 
form 


({?}•{?} 



for n = 0,1,..., N — 1, 


where for nonnegative real numbers x we denote by {x} = x — |_xj the 
fractional part of x. 

Let (3 be a primitive element of the multiplicative group 7L* Nl i.e., 
we have {/ 3 k mod N : k = 1, 2 ,..., N - 1 } = Z* N . As is well-known 
P N ~ l = (3° = 1 mod N. Moreover, its multiplicative inverse /3 _1 e TL* N 
is also a primitive element. We write each component of the generating 
vector ( 0 i, 0 2 ... , 0 a ) as 


gj = (3 Cj 1 mod N for some 1 < Cj < N — 1. 


Note that the fast component-by-component algorithm of [T9] for con¬ 
structing the generating vector computes the values c 3 as a by-product, 
and hence no additional computation is needed to obtain the values c 3 
in this case. 

Clearly, the ordering of the QMC points does not affect the quadra¬ 
ture sum. We now specify a particular (unconventional) ordering which 
allows fast matrix-vector multiplications. We define a?o = (0, - - -, 0), 
and for n = 1 , 2 ,..., Af — 1 we define 

f P~ (n - 1) g 2 } \ (3~^g s \ 

Xn vi n i’l n /’"■’! n f 

f } f \ f /3 _ ( Tl_1 )/3 Cs_1 

{ N J N J ’\ N 

(\P cl ~ n \ f P C2 ~ n \ f P Cs ~ n \\ 




In essence, we have changed the ordering by substituting the conven¬ 
tional index n with / d _< - n_1 ' ) and replacing each generating vector com¬ 
ponent gj by /3 Cj _1 . 

The quadrature points we consider in (J 2 j) are now given by 


Vn = <^On) 
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= (<p(x n>1 ), ip(x nt2 ), • • •, y{x n , s )) for n = 0,1,. .., TV - 1, 

where w;e apply the same univariate transformation ip : [0,1] —> R to ev¬ 
ery component of every point. One example for such a transformation 
is <p{x) = $ _1 (:r), the inverse of the cumulative normal distribution 
function; this maps the points from [0, l] s to as we already dis¬ 
cussed in the introduction. Another example is <p{x) = 1 — \2x — 1|, 
the tent transform; results for lattice rules usually apply to periodic 
functions, applying the tent transform yields similar results for non¬ 
periodic functions, see [4j|. The case where <p(x) = x is included as a 
special case. 

We discuss now the multiplication of the matrix Y with a column 
vector a e R s . Since y 0 = ( 95 ( 0 ), <^(0),..., <^(0)) we have 


S 



In particular, if ip is the identity mapping then y 0 a = 0. Thus the 
first component can be computed using at most s — 1 additions and 1 
multiplication. We consider now the remaining matrix 



In the following we show that Y' can be written as a product of a 
circulant matrix Z and a matrix P in which N — 1 entries are 1 and 
the remaining entries are 0. 

Recall that (3 is a primitive element of JZ N . For k G Z let 



Then we have Zk = Zk+e(N-i) f° r all £ £ Z. Let 


/ z 0 z 1 z 2 

Zn~ 2 Zo Zi 


ZN -3 Zn- 2\ 
ZN -3 


ZN -3 Zn -2 Zq 


z 2 

\ Zi z 2 


Zo z 1 

Zn~ 2 Z 0 ) 
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We define the matrix P = {p k ,j)i<k<N-i,i<j< s e {0, 1} {N i)xs by 


Pk,j 


1 if k = Cj, 

0 otherwise. 


Each column of the matrix P contains exactly one element 1, with the 
remaining elements being 0. ft is now elementary to check that 

(3) Y' = ZP. 


Note that the matrix Z is exactly the same as the matrix used in the 
fast component-by-component algorithm of [T9]. In effect, the matrix 
P specifies which columns of Z to select (namely, the ci-th, the C 2 -th, 
..., and the c. s -th) to recover Y'. 

Let a 6 R s be any column vector. Then a' = Pa can be obtained 
in at most 0(N ) operations due to the special structure of P, and 
the matrix-vector multiplication Za! can be computed in 0(N log N) 
operations using FFT (see i) since Z is circulant. Thus Y'a can 
be computed in 0(N log N) operations, and hence the matrix-vector 
multiplication Ya can be carried out using 0(N log N) operations plus 
at most s — 1 additions. 

We remark that the formula (J3J) can also be used to generate the 
matrix Y', i.e., to generate the quadrature points in a fast way. Further, 
if one wants to store the point set, i.e., matrix Y, one can simply store 
the primitive root /3 and the s numbers c±,... ,c s . 

The case where N is not a prime number can be treated as in [ 20] . 

We finish this subsection with a simple example to illustrate the idea. 


Example 1. Let s = 3, N = 7, and (g 1 , g 2 , g^) = (1,5,3). A primitive 
root for Zjj is (3 = 3, with multiplicative inverse /3 _1 = 5. We have 

gi = 1 = 3 1_1 mod 7 =>- C\ = 1, 

c /2 = 5 = 3 6-1 mod 7 =>- c 2 = 6, 


(? 3 = 3 = 3 2 1 mod 7 


C3 — 2. 


The conventional ordering of the points and the new ordering are 

' Xq = ( 0 , 0 , 0 ), 

*2 = ({* 

*3 = ({ 

*4 = ({* 


f (0,0,0), 

1 5 3\ 

7; 7; j)i 

2 3 6 \ 
7 1 7 1 7 )i 

3 12 \ 
7 j 7 j 7/) 

4 6 5\ 

7 ! 7 J 7/> 

5 4 1\ 
7j 7 1 7/) 

6 2 4\ 
V. V7’ 7’ 7/’ 




versus 


u 

7 )’ { 7 


7 J ’ L 7 
3 1_3 1 r 3 6-3 


7 J 7 V7’ 7’ 1> 

36-2, ,32-2^ /5 4 1\ 

J ? X 7 J ) \ 7 ? 7? 7/5 

a 2 - 3 - 


},{¥» = ( 


},{^p},{^i}) = (f 

*5 = ({^},{^},{^}) = ( 

l* 6 =({^},{^},{^}) = (f,i?) 


7’ 7’ 7 
4 6 5 
7’ 7’ 7 
6 2 4 
7’ 7’ 7 

2 3 6 
7’ 7’ 7 

3 12 
7’ 7’ 7 
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It is easy to see that indeed 




T(§) ¥>(?) ¥>(§) ¥>(f) ¥>(f)\ 


/i 0 o\ 

<p(x 2 ) 


¥>(f) <P$) ¥>(?) ¥>(?) ¥>(?) ¥>(?) 


0 0 1 



¥>(f) ¥>(f) ¥>(*) ¥>(?) ¥>(?) ¥>(?) 


000 



¥>(?) <P(i) ¥>(f) <P{\) ¥>(?) ¥>(?) 


000 

t( x 5 ) 


v>(?) t(§) t(I) v>(i) v>(?) 


000 

Y> 


{t(D ¥>(?) ¥>(!) ¥>(f) ¥>(f) 

z 


1 ° 1 0/ 

p 


The matrix P specifies that we select the first, the sixth, and the second 
columns of Z, as indicated by the values of c\,oi,c-$, to recover Y'. 

Remark 1. The method discussed above works in the same way for 
polynomial lattice rules over a finite field F b of order b. 

Remark 2. The method does not work when we apply general random¬ 
ization techniques such as “shifting” for lattice rules or “scrambling” 
for polynomial lattice rules. This is because the corresponding trans¬ 
formation (p in the mapping y n = <p(x n ) fails to be the same mapping 
in all coordinate directions. If we were to restrict all random shifts to 
be of the form A = (A,..., A) e [0, l] s in the case of lattice rules then 
the method would work. 


Remark 3. Higher order polynomial lattice rules, which have been in¬ 
troduced in [5j, also fit into the structure used in this subsection since 
they can be viewed as the first b m points of a polynomial lattice point 
sets with b ma points, where a G N denotes the smoothness. Here a = 1 
corresponds to the classical polynomial lattice rules. However, if we 
use the method from this paper, then the matrix vector multiplication 
uses the full b ma points, which means the matrix vector multiplica¬ 
tion requires 0(b ma ma) operations, instead of 0{b m s ) operations for 
a straightforward implementation. Thus this method is only advanta¬ 
geous if b ma ma -C b m s. For a > 2 this implies that b m -C s, which 
usually does not hold. 


2.2. The union of all Korobov lattice point sets. Hua and Wang [D2, 
Section 4.3] studied the point set 


ng 

~K 


ng 

K 


ng 


s —1 


K 


for n,g = 1,2 


, K — 1, 


where K is a prime number. The number of points is N = (K — l) 2 . 
This is essentially the union of all Korobov lattice point sets. (Note 
that Hua and Wang also included the cases n = 0 or g = 0, or both 
n — g — 0, but these only yield the zero vector.) 
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This point set achieves only a rate of convergence of the weighted 
star-discrepancy of 0(N~ 1 ' 2+S ) for any S > 0, however, the dependence 
on the dimension of the weighted star-discrepancy is better than what 
is known for lattice point sets or polynomial lattice point sets in some 
circumstances, see [6] for more details. 

We now specify a particular ordering of the points to allow fast 
matrix-vector multiplications. Let /? be a primitive element in 7L* K . 
As in Subsection 12.11 we replace the index n in the conventional order¬ 
ing by /3~ < ' n ~ 1 \ and similarly we replace the index g by That is, 

for n,g = 1, 2,..., K — 1, we define 


•Tn,g 


f 3 -(n-l)pO(g-l)r "i f ^-(n-l)^2(g-l) 


K 


K 


K 


^-(ra-l)^(s-l)(g-l) 


K 


(3 c 9, l- n 1 ( pcg,2-n 'j r [jCg,3-n. "| ( ^g,s 

1 • • • 1 


K 


K 


K 


K 


with 


c 9,j = U ~ !)(0 - !) + 1 mod ( K - !) for 3 = 1, • ■ • t s. 


We also define 

Un,g = ( Pi. x ri,g')- 

Finally we define the matrix 


(4) Y' = 

( ^ 

, with Y' = 

( Vl ' 9 \ 

V2,g 

for g = 1,. . ., K - 1 


\Yk-J 


\yK-l,g/ 



For the matrices Y' we can apply the method from Subsection 12.11 
to write it as = ZP g using the values of c Sj i,..., c S)S so that a 
matrix-vector multiplication can be computed in at most 0(K log K) 
operations. Thus one matrix-vector product for the matrix Y’ can be 
evaluated in 0(N log N) operations. 


Remark 4. The same strategy can be applied to the union of all Ko¬ 
robov polynomial lattice point sets. 


3. Applications 

3.1. Generation of normally distributed points with general 
covariance matrix. In many applications one requires realizations of 
random variables in which are normally distributed A f(fi, £) with 
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mean y = (/ii, / 12 ,..., /i s ) and covariance matrix £ G R SX L An al¬ 
gorithm to generate such random variables is described for instance 
in m Section 11.1.6] and works the following way. Let A G R sxs 
be such that A T A = £; for example, A can be the upper triangu¬ 
lar matrix in the Cholesky decomposition of £. To generate a point 
(Zi,Z 2 ,...,Z,) rs./ A/"(/x, £), one generates i.i.d. standard normal ran¬ 
dom variables Y \, Y 2 ,..., Y s ~ A/"(0,1) with mean 0 and variance 1 and 
then computes (Z u Z 2} ..., Z s ) = (Yi, Y 2 ,..., Y S )A + (/q, fi 2 ,. ■ ■, /a,). 

As we already outlined in the introduction, this procedure can be 
implemented in the following way using deterministic QMC point sets. 
Let aio; £Ci, • •., x N _i G [0, l] s be a set of quadrature points as described 
in Section [2] Let <f> _1 be the inverse of the cumulative normal distri¬ 
bution function. Set 

Vn = 

and compute 

(5) = y n A + y. 

Note that we do not assume any structure in the matrix A. This is 
contrary to a number of scenarios in e.g., mathematical finance; see 
our discussion in the introduction. 

3.2. Partial differential equations with “uniform” random co¬ 
efficients. The matrix-vector multiplication also arises in applications 
of QMC for approximating linear functionals of solutions of PDEs with 
random coefficients, see e.g., S3- 

A prototypical class of countably parametric, elliptic boundary value 
problems reads as 

(6) -V • {a{x,y)\7u(x,y)) = g(x), x G D c R d , y G [-|, |] N , 

(7) u(x, y) = 0, x G dD , 

where x is a vector in the convex, physical domain D C M. d , and y = 
( 2 / 1 , y 2 ,...) is a parametric sequence in [— |, |] N , with yj being uniformly 
distributed on [—|, |], Here, we distinguish notationally between a 
vector x in the spatial domain D and a vector y in the parametric 
domain [— |] N . The coefficient a(x,y) depends on the parameter 

sequence y in an affine manner, ie. 

OO 

(8) a(x, y) = -0o( t) + yj 0 j(x ) 

j =1 

In (J8]), the sequence of functions Vb(x) for j > 1 is assumed to decay as 
j —> 00 such that ||0j||ioo( £) ) < cxo for some 0 < p < 1 and that 
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E s >i IIViM l°°(d) < oo. For more background, necessary assumptions 
and theoretical results see El 

In JIT] the aim is to approximate the expected value E(G(w)) with 
respect to the random sequence y of a linear functional G of the solution 
u of the PDE. The algorithm truncates the infinite sum in ([8]) after s 
terms (i.e. set yj = 0 for j > s), solves the truncated problem using a 
(piecewise linear) finite element method with M mesh points, and then 
approximates the s-dimensional integral using an iV-point QMC rule. 
The resulting three sources of errors, namely, the truncation error, the 
finite element error, and the quadrature error, need to be balanced. For 
instance, in the case that Xqli IIVhllJuxD) < 00 an d th a f G G L 2 (D), 
Em Theorem 8.1] yields for continuous, piecewise linear Finite Element 
discretizations of ([6]) - ([8]) in D on quasiuniform meshes that we should 
choose 

(9) N x s x M 2/d , 

where x indicates that the terms should be of the same order. In 
general we have N x s K for some small k, > 0 (i.e., N -C 2 s ). Thus 
the fast method of this paper can be advantageous (see the numerical 
results in Section HJ). 

Let 0i, 0 2 ,..., 0 m be a basis for the finite element space Vm- For each 
0 < n < N, let u$ = J2h=i "Vfc be the finite element approximation 
of the solution u of the PDE given the parameter y n G [—|, |] s . Then 
for each 0 < n < N we solve the linear system 

(10) ... ,u$) B{y n ) = (? 1 ,? 2 ,---,?m) 

for ...,4 ) ), where g k = f D g(x) <j> k (x) dx for k = 1,..., M, 

and where the symmetric stiffness matrix B(y n ) = (b n ^/)k,e depends 
on y n and has entries 

(11) b n>k/ = [ a(x, y n ) V0 fc (x) • V0 £ (x) dx, 1 < k, t < M. 

J D 

Note that, with a slight abuse of notation, a(x, y n ) is given by (JHJ) but 
truncated to s terms. The expected value of the solution can then be 
approximated by 

N-1M-1 

(12) u(x) = — 

n =0 fc=l 

Due to the linear structure in (J8]) for the uniform case, we can write 

S 

&0 ,k,£ ^ ^ 0 71 iV, 1 ^ ^ M , 

3 =1 
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where y n . 3 denotes the jth component of the nth point y n , and 


a jt k,e = / i>j{x)X7(J)k(x) ■ V^(x) df, 1 <k,£< M, j > 0. 


D 


In the standard approach, one defines the symmetric matrices Aj = 
(dj,k/)i<k,e<M and sets 


S 


B(y n ) = A 0 + ^Vn,jAj, 0 <n< N. 


Note that Aj is usually sparse, with only 0(M ) nonzero entries in the 
same position, depending only on the relative supports of the basis 
functions (p k , which are thus in particular independent of j. The cost 
for computing B(y n ) for all n = 0,... ,N — 1 is therefore O(MNs) 
operations. 

The fast QMC matrix-vector approach is implemented as follows: 
let Y = (y n )o<n<N be the matrix whose rows are the quadrature 
points of the QMC rule, and let a k / = {ai,k,e, ■ ■ ■, a s ,k,i) T and bk/ = 
( b 0 ,k,e, ■ ■ ■, bN-i,k,e) T - Then compute 

(13) b kti = ( a 0 ,k,e , • • •, a 0>k ,e) T + Ya kti for all l <k,£< M, 

where each matrix-vector multiplication Ya k ^ should be done using 
the approach of the previous section. Since only 0(M ) vectors a k / are 
nonzero, this approach for obtaining B(y n ) for all n = 0,... ,N — 1 
therefore requires only 0(M N log N) operations. 

The improvement in the computational cost is that we replaced a 
factor of O(s) by C?(log N) (or O(logs) when N x s, see (J9|l). However, 
this method requires us to store all the vectors b k /- Using the sparsity 
of the stiffness matrices which is of O(M), we require O(MN) storage. 

3.3. Partial differential equations with “log-normal” random 
coefficients. We consider the PDE (j6]) again but now we assume that 
the random diffusion coefficient is given in parametric form by 



This formula arises from the assumption that the logarithm of the 
random coefficient a(x, •) is a Gaussian random field in the domain D, 
which is parametrized in terms of principal components of its covariance 
operator by a Karhunen-Loeve expansion. We shall refer to this case 
as the “log-normal” case. 

We may proceed as in the previous subsection, following (fT0]MfT2lh 
However, unlike the uniform case where linearity can be exploited, 
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the integral (fill) for the log-normal case generally cannot be solved 
explicitly so that we need to use a quadrature rule to approximate 
CD- Let x hk/ , x 2 ,fc,D • • •, € D denote the set of quadrature points 

and Wi : k t £, W 2 ,k,ii ■ ■ ■, u>i,k,e £ K. denote the corresponding quadrature 
weights which are used to approximate (1111) . 

i 

(14) ^n,fcy ~ bn,k,l ^ &(Xj : k,£i Un)^4*k(Xj t k,i) ' ^&i(Xi,k,£) 

i =1 

for 0 < n < N and 1 < k,£ < M. Let B(y n ) = {b n ,kj)k,t- Thus we 
need to compute 

(15) 

s 

&(Xi t k,£i Un) exp (Qi t k,t,n) •> @i,k,£,n ”0o(jh,fc,f) T ^ ^ VnJ Vh ("H,fc,l) i 

3 = 1 

for all l<i<I,0<n<N and 1 < k,£ < M such that V (ftk(xi,k,e) ■ 
V 4>e(xi } k,e) is nonzero. The number of these nonzero inner products is 
0(M ) since for a fixed k, the number of i such that the intersection 
of the supports of fa and (fte is nonempty does not depend on M. The 
standard approach to obtain B(y n ) for all n — 0,..., N — 1 therefore 
requires 0(1 M N s) operations. 

We now describe the fast approach. Let 


(@i,k,£,n) 1<*<7 j 
0 <n<N 


M)(zz,m)\ 

fa (Xi,k,i) 
\fa (Xi,k,e)/ 






f^i(xi,k,i)\ 
fa (xi, k ,i) 


\fps(xi,k,e)/ 


d r k,e — (^i,k,i, • • ■) ^i,k,e) G M JVxi 


d' k,t — (db t k,h ■ ■ ■, ’L/,fcy) £ M sX/ . 


Then (1151) can be written in matrix form as 
(16) ®k,e = ^k,e + Y'&kfr 


where the multiplication Y^k/ should be done as described in Sec- 
tion[2] Hence (1T5]) can be computed using the fast QMC matrix method 
and B(y n ) for all 0 < n < N can be computed in 0(1 M IV log N) op¬ 
erations. Again, the saving is that we replaced the factor 0(s) by a 
factor 0(logiV) in the computational cost. 
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4. Numerical experiments 

In this section we carry out numerical experiments for the three 
applications from the previous section. In all the numerical experiments 
in the paper the times are averaged from 5 independent runs using 
Matlab R2013b on an Intel®Core™ Xeon E5-2650v2 CPU @ 2.6GHz. 

Experiment 1: normally distributed points. We are interested 
in comparing the computation times using the standard approach of 
multiplying y n A for n = 0,1,... , iV — 1, and the fast QMC matrix 
approach described in Subsection 12.11 with lattice point sets. 

Table Q] shows the computation times in seconds for various values 
of N and s. The value on top shows the standard approach, whereas 
the value below shows the fast QMC matrix approach. For our exper¬ 
iments we chose /x = (0, 0,..., 0), and A a random upper triangular 
matrix with positive diagonal entries (so that A corresponds to the 
Cholesky factor of a random matrix £). The computation times do not 
include the component-by-component construction of the lattice gen¬ 
erating vectors, the computation of ci,...,c s (since this information 
can be obtained from the fast component-by-component construction), 
nor the computation of for n = 0,1 ,..., iV — 1 (since this 

computation is the same for both methods). 

The numerical experiments in Table [L] show that there is an advan¬ 
tage using the fast QMC matrix-vector product if the dimension is 
large and the advantage grows as the dimension increases. This is in 
agreement with the theory since the computational cost in the stan¬ 
dard approach is of order 0(Ns 2 ) operations, whereas in the fast QMC 
matrix-vector approach it is of order 0(s N log N) operations. Recall 
that the fast QMC matrix method incurs a storage cost of O(Ns). 

Experiment 2: the uniform case. We consider the ODE 
(17) 

-T (<i(x, y)Ai(j:, y)) — g(x) for x e ( 0 , 1 ) and y € [- 5 , |] M , 
u(x, y) = 0 for x = 0 , 1 , 

OO 

a(x, y) = 2 + ^ VjJ~ 3/2 sin( 2 vr jx). 

3 = 1 

Thus Ej>r II^IIl-Iop) < oo for any e > 0 , and (jUJ) implies that we 
should choose N x s. In our experiments we choose M = N = s. 

To obtain an approximation of the solution we use finite elements. 
Let Xk = k/M for k — 0,1,..., M and for k — 1, 2,..., M — 1 define 
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Method 

N 

s = 200 

s = 400 

s = 600 

5 = 800 

s = 1000 

std. 

fast 

16001 

0.309 

0.164 

0.741 

0.301 

1.296 

0.450 

1.617 

0.589 

2.154 

0.741 

std. 

fast 

32003 

0.589 

0.603 

1.468 

1.198 

2.435 

1.792 

3.063 

2.395 

4.238 

2.994 

std. 

fast 

64007 

1.167 

1.804 

2.970 

3.853 

4.921 

5.551 

6.001 

7.582 

8.349 

9.827 

std. 

fast 

127997 

2.579 

2.331 

5.889 

4.661 

9.490 

7.321 

11.891 

9.984 

16.818 

12.284 

std. 

fast 

256019 

4.279 

5.401 

11.105 

10.933 

17.646 

16.174 

23.115 

24.147 

33.541 

26.898 

std. 

fast 

512009 

8.885 

10.947 

23.368 

22.066 

31.942 

35.543 

48.059 

45.164 

66.378 

56.190 


TABLE 1. Times (in seconds) to generate normally dis¬ 
tributed points with random covariance matrix. The top row 
is the time required by using the standard approach, whereas 
the bottom row shows the time required using the fast QMC 
matrix-vector approach. 


the hat function 


(18) 

Then 




({x - x k -i)M 
< (a; fc+ i - x)M 

1 ° 


if x k -i < x <x k , 
if x k < x < x k+1 , 
otherwise. 


,k,e 


{ AM if k = £, 

-2 M if \k —£\ — 1, 
0 otherwise, 


and for j > 1 we have 


a 


j,k,£ 


= 


■-3/2 


sin(27 rjx)(j)' k (x)(f)[(x) dx 


M 


nj 


\Tm) sm (¥) 


M 2 


5/2 


7TJ 

M 2 


sm 


■K] 


5/2 


sm 


( 27rj \ 
M ) 

(S) 

(3) 


sm 


sm 


' wj(2k-l) \ 
M 

nj(2k-\-l) \ 

M J 




if k 
if £ 
if £ 


i, 

k — 1, 
k + 1, 


0 


otherwise. 
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M = s = 2N 

N 

67 

127 

257 

509 

1021 

2053 

4001 

8009 

16001 

std. 

fast 

1 

0.035 

5 

0.042 

31 

0.114 

190 

0.462 

1346 

1.562 

10610 

5.591 

74550 

19.678 

f»144h 

87.246 

?» 1000 h 

342.615 

M = s= [Viv] 

N 

67 

127 

257 

509 

1021 

2053 

4001 

8009 

16001 

std. 

fast 

0.066 

0.012 

0.164 

0.015 

0.474 

0.028 

1.272 

0.059 

3.570 

0.126 

10.813 

0.265 

30.127 

0.516 

89.42 

1.113 

273.873 

2.443 

s = N and M = N 2 

N 

67 

127 

257 

509 


std. 

fast 

6 

0.243 

82 

1.385 

1699 

11.268 

27935 

107.042 



TABLE 2. Times (in seconds) to obtain the average value 
of the finite element coefficients of the approximation (1121) 
to (El). Top: M = s = 2N. Middle: M = s = \VN]. 
Bottom: s = N and M = N 2 . 


Thus the matrices Aj and B(y n ) are tridiagonal. For simplicity we 
choose g such that • • •, ?m) = (1,1, ..., 1). 

Table [2] shows the computation times comparing the standard ap¬ 
proach with the fast QMC matrix method based on lattice point sets 
as described in Section 12.11 In this case the mapping in y n = Lp(x n ) 
is tp(x) = x — 1/2, since the lattice points need to be translated from 
the usual unit cube [0, l] s to [—|, ^] s . Note that we do not apply any 
random shifting as analyzed in im Since the dimension s is large, the 
fast QMC matrix method is very effective in reducing the computation 
times. Note that in Table [2] for the case M = s = 2N the times for 
the standard method for N = 8009 and N = 16001 are in hours and 
are estimated from extrapolating on previous values in the table. The 
experiments show there is a clear advantage of fast QMC matrix-vector 
approach especially for large values of M, N and s. 


Experiment 3: the log-normal case. In one space dimension, we 
consider the two-point boundary value problem for the parametric, 
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M = s = 2N 

N 

67 

127 

257 

509 

1021 

2053 

4001 

8009 

16001 

std. 

fast 

0.028 

0.040 

0.051 

0.033 

0.140 

0.094 

0.436 

0.326 

1.734 

1.122 

15.173 

4.296 

84.381 

15.203 

614.636 

60.546 

4391.2 

270.691 

M = s= fV^Vl 

N 

67 

127 

257 

509 

1021 

2053 

4001 

8009 

16001 

std. 

fast 

0.030 

0.132 

0.053 

0.036 

0.090 

0.052 

0.182 

0.106 

0.375 

0.228 

0.791 

0.480 

1.609 

0.940 

4.100 

2.670 

7.874 

4.597 

s = N and M = N 2 

N 

67 

127 

257 

509 

1021 


std. 

fast 

0.162 

0.204 

0.945 

1.084 

9.935 

10.154 

84.790 

83.861 

891.175 

746.907 



TABLE 3. Times (in seconds) to obtain the average value 
of the finite element coefficients of the approximation (1121) to 
(fl9l) . Top: M = s = 2 N. Middle: M = s = f \/iV ”]. Bottom: 
s = N and M = N 2 . 

second order ODE 
(19) 

UU ( a ^’ V ^} = for x 6 !lK 1 ' 1 alld V 6 (“1 if ’ 

u(x, y) = 0 for x = 0, 1 , 

( OO 

2 + Vi i~ 3/2 si n(2njx) 

3 =1 

We use M finite elements to construct the approximate solutions as 
in (1I8|) . To compute (fT4|) . we use an equal weight quadrature with M 
(so I = M) points. 

Table [3] shows the computation time for the log-normal case with 
different choices of number of finite elements M, the number of QMC 
points N and the truncated dimension s. As one would expect from the 
theory, the most significant advantage of the fast QMC matrix method 
occurs when 2 s is large compared to N, which is also reflected in the 
numerical results. 
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